Rank | Count | Beginning |
---|---|---|
13610 | 1423 | Hän |
70267 | 1145 | Se |
23847 | 1072 | Jos |
6485 | 880 | Ei |
54615 | 837 | Nyt |
34820 | 726 | Kun |
49835 | 666 | Myös |
48675 | 632 | Mutta |
21401 | 629 | Ja |
71131 | 606 | Sen |
80414 | 547 | Tämä |
40620 | 466 | Lisäksi |
57437 | 450 | On |
50779 | 419 | Näin |
13879 | 407 | Hänen |
77140 | 401 | Suomen |
10444 | 361 | Esimerkiksi |
97859 | 354 | Yhtiön |
90935 | 334 | Vaikka |
94947 | 331 | Viime |
8620 | 325 | En |
19843 | 318 | Ilmoita |
56130 | 313 | Olen |
72787 | 288 | Seuratuimmat |
26593 | 287 | Kaikki |
36309 | 260 | Kuva: |
97756 | 253 | Yhtiö |
16159 | 247 | He |
36794 | 247 | Kyllä |
80609 | 242 | Tämän |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt
from sentences
group by substring_index(sentence, ' ', 1)
order by cnt desc
limit 50;
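
The same counting can be sketched outside the database. The following minimal Python version mirrors the query above and generalizes it to any N, assuming the sentences are available as a list of strings. The function name `sentence_beginnings`, its parameters, and the sample sentences are illustrative assumptions, not part of the corpus tooling.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent sentence beginnings of n words.

    Mirrors substring_index(sentence, ' ', n): the beginning is everything
    before the n-th space, or the whole sentence if it has fewer spaces.
    """
    counts = Counter(
        " ".join(sentence.split(" ")[:n]) for sentence in sentences
    )
    # most_common sorts by count, descending, like "order by cnt desc".
    return counts.most_common(limit)


# Hypothetical sample sentences, for illustration only.
sample = [
    "Hän tuli kotiin.",
    "Hän lähti töihin.",
    "Se on totta.",
]

for beginning, count in sentence_beginnings(sample, n=1):
    print(count, beginning)
```

Splitting on a single space rather than on arbitrary whitespace keeps the behavior close to the SQL expression. Passing `n=2`, `n=3`, or `n=4` would give the longer beginnings covered in the following subsections.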